ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures.

Authors

  • Hongyi Zhou
  • Mu Gao
  • Jeffrey Skolnick
Abstract

Advances in next-generation sequencing technologies have made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information, so computational approaches are needed that can accurately and efficiently identify the subset of disease-associated variations. The limited accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations using a comprehensive combination of protein sequence and structure features. Compared with the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, and FATHMM, our method, ENTPRISE, exhibits significant improvement. In particular, on a testing dataset consisting only of proteins with balanced disease-associated and neutral variations, defined as having a ratio of neutral to disease-associated variations between 0.3 and 3, the Matthews correlation coefficient of ENTPRISE is 0.493, compared with 0.432 for PPH2-HumVar, 0.406 for SIFT, 0.403 for MUTATIONASSESSOR, 0.402 for PPH2-HumDiv, 0.305 for MUTATIONTASTER, and 0.181 for FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available to academic users as a web service at http://cssb.biology.gatech.edu/entprise/.
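
The two evaluation ideas in the abstract, the balanced-protein criterion and the Matthews correlation coefficient, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `is_balanced` and `matthews_corrcoef`, the 1 = disease-associated / 0 = neutral label encoding, and the treatment of the 0.3 and 3 bounds as inclusive are all assumptions made here for illustration.

```python
import math


def is_balanced(n_neutral, n_disease, low=0.3, high=3.0):
    """Return True if the neutral/disease-associated ratio lies in [low, high].

    Assumption: the bounds are inclusive and a protein with no
    disease-associated variations is treated as unbalanced.
    """
    if n_disease == 0:
        return False
    return low <= n_neutral / n_disease <= high


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding: 1 = disease-associated, 0 = neutral.
    Returns 0.0 when the denominator vanishes (a common convention).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the paper.
    truth = [1, 1, 1, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 1]
    print(is_balanced(n_neutral=3, n_disease=3))
    print(matthews_corrcoef(truth, preds))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), and it stays informative when the two classes differ in size, which is why it suits comparisons like the one reported above.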

Similar resources

Prioritization of Deleterious Variations in the Human Hypoxanthine-Guanine Phosphoribosyltransferase Gene

Background and Objectives: Non-synonymous single nucleotide polymorphisms are typical genetic variations that may affect the structure or function of expressed proteins, and therefore could be involved in complex disorders. A computational analysis was performed to evaluate the phenotypic effect of no...

Characteristics Determination of Rheb Gene and Protein in Raini Cashmere Goat

The aim of the present study was to determine the characteristics of the Rheb gene and protein in the Raini Cashmere goat. Comparative analyses of the nucleotide sequences were performed. Open reading frames (ORFs), theoretical molecular weights of deduced polypeptides, the protein isoelectric point, protein characteristics and three-dimensional structures were predicted using standard online software. The f...

Predicting the Functional Effect of Amino Acid Substitutions and Indels

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions of the functional effects of sequence variations and to narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including ...

Identification of Novel Mutations in IL-2 Gene in Khorasan Native Fowls

The intron-exon structure of Khorasan native fowl interleukin-2 (IL-2) was investigated. For this purpose, twenty chickens were selected from the Native Fowl Breeding Station of Khorasan province, and genomic DNA was extracted using a modified conventional DNA extraction protocol. An 875 bp fragment of IL-2 was successfully amplified, including a small part of the promoter, exon 1, intron 1, an...

Neighbor Preferences of Amino Acids and Context-Dependent Effects of Amino Acid Substitutions in Human, Mouse, and Dog

Amino acids show apparent propensities toward their neighbors. In addition to preferences of amino acids for their neighborhood context, amino acid substitutions are also considered to be context-dependent. However, context-dependence patterns of amino acid substitutions still remain poorly understood. Using relative entropy, we investigated the neighbor preferences of 20 amino acids and the co...

Journal title:
  • PLoS ONE

Volume 11, Issue 3

Pages: -

Publication date: 2016